Split on hyphens as well as whitespace #98

nolanlawson · 2014-07-08T19:58:49Z

Example:

Take the New York-San Francisco flight.

"york-san" isn't a word, so it shouldn't be output by the tokenizer.

For precedent, Lucene's standard tokenizer also splits on hyphens, although it makes an exception for product numbers like 31-5-6. That exception is not implemented here due to complexity.
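
To make the intended behavior concrete, here is a minimal sketch of splitting on both whitespace and hyphens. The `tokenize` function and its regex are assumptions for illustration only, not the tokenizer code in this pull request.

```javascript
// Hypothetical sketch, not lunr's actual tokenizer.
function tokenize(text) {
  if (text == null) return [];
  return String(text)
    .trim()
    // Break on a run of whitespace or on a single hyphen.
    .split(/\s+|-/)
    .map(function (token) { return token.toLowerCase(); });
}

console.log(tokenize("Take the New York-San Francisco flight."));
// [ 'take', 'the', 'new', 'york', 'san', 'francisco', 'flight.' ]
```

With this approach a product number such as 31-5-6 is also broken into three tokens, which is the simplification mentioned above.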

olivernn · 2014-07-14T17:17:49Z

Thanks!

nolanlawson · 2014-07-14T17:44:10Z

No prob!

debug64 · 2014-07-25T21:56:56Z

I think the solution has a problem when indexing text like "A - B": it produces an empty token, which results in a broken (not searchable) index.
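
The empty token can be reproduced with a small sketch. Both functions below are hypothetical: `naiveTokenize` assumes the split treats each hyphen as its own separator, and `tokenizeWithoutEmpties` shows one possible remedy (collapse runs of separators, then drop empty strings), which may differ from the fix that was actually shipped.

```javascript
// Hypothetical sketch: a hyphen surrounded by spaces yields empty strings
// when whitespace runs and hyphens are treated as separate separators.
function naiveTokenize(text) {
  return String(text).trim().split(/\s+|-/);
}

// One possible remedy: treat any run of whitespace and hyphens as a single
// separator, then filter out empty strings (e.g. from a leading hyphen).
function tokenizeWithoutEmpties(text) {
  return String(text)
    .trim()
    .split(/[\s-]+/)
    .filter(function (token) { return token.length > 0; })
    .map(function (token) { return token.toLowerCase(); });
}

console.log(naiveTokenize("A - B"));          // [ 'A', '', '', 'B' ]
console.log(tokenizeWithoutEmpties("A - B")); // [ 'a', 'b' ]
```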

nolanlawson · 2014-07-26T18:06:03Z

Yup, you're right, that's a bug. Will fix.

olivernn · 2014-08-11T19:18:17Z

Version 0.5.5 includes the fix for this issue.

Split on hyphens as well as whitespace

dd9ce22

nolanlawson mentioned this pull request Jul 8, 2014

How to Split words on Hyphen ? pouchdb-community/pouchdb-quick-search#3

Closed

olivernn merged commit dd9ce22 into olivernn:master Jul 14, 2014

nolanlawson mentioned this pull request Jul 26, 2014

split on hyphens and remove empty tokens #101

Closed
